Mel sub-band filtering and compression for robust speech recognition
Authors
Abstract
The Mel-frequency cepstral coefficients (MFCC) are commonly used in speech recognition systems. However, they are highly sensitive to the presence of external noise. In this paper, we propose a noise compensation method for Mel filter bank energies and, consequently, for MFCC features. The compensation is performed in two stages: Mel sub-band filtering, followed by compression of the Mel sub-band energies. For the compression step, we propose a sub-band SNR-dependent compression function, which we use in place of the logarithm of conventional MFCC feature extraction in the presence of additive noise. Results show that the proposed method significantly improves the performance of MFCC features in noisy conditions, reducing the average word error rate by up to 30% for isolated word recognition on three test sets of the Aurora 2 database.
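The abstract does not give the exact form of the SNR-dependent compression function, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a per-band noise estimate is available (for example from leading non-speech frames), uses a generalized logarithm (x^γ − 1)/γ, which tends to ln x as γ → 0, and chooses γ from the sub-band SNR with a hypothetical linear map (γ = 0.5 at or below 0 dB, γ = 0 at or above 20 dB). All function names and thresholds are illustrative. The compressed energies are then decorrelated with a DCT-II, as in standard MFCC extraction.

```python
import math
from typing import List


def subband_snr_db(energies: List[float], noise: List[float],
                   floor: float = 1e-10) -> List[float]:
    """Rough per-band SNR in dB; assumes every noise[k] > 0 (hypothetical helper)."""
    return [10.0 * math.log10(max(e - n, floor) / n) for e, n in zip(energies, noise)]


def gamma_from_snr(snr_db: float, gamma_max: float = 0.5,
                   lo_db: float = 0.0, hi_db: float = 20.0) -> float:
    """Hypothetical linear map: log-like (gamma = 0) at high SNR, root-like at low SNR."""
    if snr_db <= lo_db:
        return gamma_max
    if snr_db >= hi_db:
        return 0.0
    return gamma_max * (hi_db - snr_db) / (hi_db - lo_db)


def compress(x: float, gamma: float) -> float:
    """Generalized log (x**gamma - 1) / gamma, which tends to ln(x) as gamma -> 0."""
    if gamma == 0.0:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def dct2(values: List[float], n_ceps: int) -> List[float]:
    """Unnormalized DCT-II over the band index, keeping the first n_ceps terms."""
    n = len(values)
    return [sum(v * math.cos(math.pi * q * (k + 0.5) / n) for k, v in enumerate(values))
            for q in range(n_ceps)]


def snr_dependent_cepstrum(energies: List[float], noise: List[float],
                           n_ceps: int = 13) -> List[float]:
    """MFCC-like vector with the log replaced by an SNR-dependent compression."""
    snrs = subband_snr_db(energies, noise)
    compressed = [compress(max(e, 1e-10), gamma_from_snr(s))
                  for e, s in zip(energies, snrs)]
    return dct2(compressed, n_ceps)
```

When every band has a very high SNR the output reduces to the conventional log-Mel cepstrum; as the SNR drops, the compression becomes a root function, which is less sensitive than the logarithm to small, noise-dominated energies.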
Similar resources
A framework for robust MFCC feature extraction using SNR-dependent compression of enhanced mel filter bank energies
The Mel-frequency cepstral coefficients (MFCC) are the most widely used and successful features for speech recognition. However, their performance degrades in the presence of additive noise. In this paper, we propose a noise compensation method for Mel filter bank energies and, consequently, for MFCC features. This compensation method includes two steps: Mel sub-band spectral subtraction and then compression of Mel-Sub-ba...
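Mel sub-band spectral subtraction can be sketched as ordinary power spectral subtraction applied to Mel filter-bank energies instead of FFT bins. This sketch assumes the first frames of an utterance contain only noise and uses the common over-subtraction factor `alpha` and spectral floor `beta`; the parameter values and function names are hypothetical choices, not taken from the paper.

```python
from typing import List


def estimate_noise(frames: List[List[float]], n_init: int = 10) -> List[float]:
    """Average Mel energies of the first n_init frames (assumed noise-only)."""
    init = frames[:n_init]
    n_bands = len(init[0])
    return [sum(f[k] for f in init) / len(init) for k in range(n_bands)]


def mel_spectral_subtraction(frames: List[List[float]], noise: List[float],
                             alpha: float = 1.0, beta: float = 0.01) -> List[List[float]]:
    """Subtract alpha * noise per Mel band, never going below beta * original energy."""
    return [[max(e - alpha * n, beta * e) for e, n in zip(frame, noise)]
            for frame in frames]
```

The floor keeps every enhanced energy strictly positive for positive inputs, which matters because the energies are subsequently compressed (by a logarithm or a substitute for it).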
Maximum likelihood sub-band adaptation for robust speech recognition
Noise-robust speech recognition has become an important area of research in recent years. In current speech recognition systems, the Mel-frequency cepstrum coefficients (MFCCs) are used as recognition features. When the speech signal is corrupted by narrow-band noise, the entire MFCC feature vector gets corrupted and it is not possible to exploit the frequency-selective property of the noise si...
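The observation that narrow-band noise corrupts the entire MFCC vector follows from the DCT: each cepstral coefficient is a weighted sum of all log-Mel energies, so a disturbance confined to one band leaks into every coefficient. The toy example below uses hypothetical energies for a single 20-band frame and adds noise to band 5 only.

```python
import math
from typing import List


def dct2(values: List[float], n_ceps: int) -> List[float]:
    """Unnormalized DCT-II, the transform that turns log-Mel energies into cepstra."""
    n = len(values)
    return [sum(v * math.cos(math.pi * q * (k + 0.5) / n) for k, v in enumerate(values))
            for q in range(n_ceps)]


# Hypothetical clean Mel energies for one frame (20 bands).
clean = [float(k + 1) for k in range(20)]
# Narrow-band noise that only affects band 5.
noisy = list(clean)
noisy[5] += 50.0

log_clean = [math.log(e) for e in clean]
log_noisy = [math.log(e) for e in noisy]
changed_bands = sum(1 for a, b in zip(log_clean, log_noisy) if a != b)

c_clean = dct2(log_clean, 13)
c_noisy = dct2(log_noisy, 13)
changed_ceps = sum(1 for a, b in zip(c_clean, c_noisy) if abs(a - b) > 1e-9)

print(f"log-Mel energies changed: {changed_bands} of {len(clean)}")  # 1 of 20
print(f"cepstral coefficients changed: {changed_ceps} of {len(c_clean)}")  # 13 of 13
```

Only one log-Mel value differs, yet all 13 cepstral coefficients move, which is why sub-band (pre-DCT) processing can exploit frequency-selective noise while a purely cepstral-domain method cannot.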
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a type of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Time and frequency filtering of filter-bank energies for robust HMM speech recognition
Every speech recognition system requires a signal representation that parametrically models the temporal evolution of the speech spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands which are often distributed in a mel scale. The computation of those energies is performed in diverse ways, but it always includes smoothing o...
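Mel-distributed band energies of this kind are typically computed by weighting a frame's power spectrum with overlapping triangular filters spaced uniformly on the mel scale. The sketch below assumes the common mel formula m = 2595 log10(1 + f/700) and HTK-style triangles; the function names and the example configuration are illustrative assumptions.

```python
import math
from typing import List, Optional


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float,
                   f_min: float = 0.0, f_max: Optional[float] = None) -> List[List[float]]:
    """Triangular filters (one row per filter) over the n_fft // 2 + 1 FFT bins."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 points equally spaced on the mel scale, converted back to Hz.
    edges = [mel_to_hz(m_lo + i * (m_hi - m_lo) / (n_filters + 1))
             for i in range(n_filters + 2)]
    freqs = [b * sample_rate / n_fft for b in range(n_fft // 2 + 1)]
    bank = []
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in freqs:
            if lo < f <= mid:      # rising edge of the triangle
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:     # falling edge of the triangle
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def filterbank_energies(power_spectrum: List[float], bank: List[List[float]]) -> List[float]:
    """Weighted sum of power-spectrum bins inside each triangular filter."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
```

For example, `filterbank_energies(power_spectrum, mel_filterbank(23, 512, 8000))` maps a 257-bin power spectrum of an 8 kHz signal to 23 band energies; the triangles smooth the spectrum within each band, and their widths grow with frequency as the mel scale dictates.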
Improving the performance of MFCC for Persian robust speech recognition
The Mel-frequency cepstral coefficients are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...